Orthogonal subsampling for big data linear regression

Authors

Abstract

The dramatic growth of big datasets presents a new challenge to data storage and analysis. Data reduction, or subsampling, that extracts useful information from datasets is a crucial step in big-data analysis. We propose an orthogonal subsampling (OSS) approach for big data with a focus on linear regression models. The approach is inspired by the fact that an orthogonal array of two levels provides the best experimental design for linear regression models, in the sense that it minimizes the average variance of the estimated parameters and provides the best predictions. The merits of OSS are three-fold: (i) it is easy to implement and fast; (ii) it is suitable for distributed parallel computing and ensures that subsamples selected in different batches have no common points; and (iii) it outperforms existing methods in minimizing the mean squared errors of the estimated parameters and maximizing the efficiencies of the selected subsamples. Theoretical results and extensive numerical studies show that the OSS approach is superior to existing subsampling approaches. It is also more robust to the presence of interactions among covariates and, when they do exist, it provides more precise estimates of the interaction effects than existing methods. These advantages are illustrated through the analysis of real data.
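To make the central idea concrete, below is a minimal Python/NumPy sketch of what "selecting a subsample that resembles a two-level orthogonal array" might look like: it standardizes the covariates and greedily adds the data point that keeps the selected rows closest to having entries near ±1 and near-orthogonal columns. The function name, the additive score, and the naive greedy search are illustrative assumptions, not the authors' OSS algorithm or its discrepancy criterion; the paper's procedure is designed to be far faster than this toy loop.

```python
import numpy as np


def greedy_orthogonal_subsample(X, k, seed=0):
    """Toy greedy selection of k rows whose standardized covariates
    resemble a two-level orthogonal array: entries close to +/-1 and
    columns close to orthogonal.  Illustrative only; not the OSS
    algorithm or discrepancy criterion from the paper."""
    rng = np.random.default_rng(seed)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each covariate
    n, p = Z.shape
    chosen = [int(rng.integers(n))]            # arbitrary starting point
    remaining = set(range(n)) - set(chosen)
    for _ in range(k - 1):
        best_i, best_score = None, np.inf
        for i in remaining:
            cand = np.vstack([Z[chosen], Z[i]])
            # Penalty 1: how far the entries are from the two levels +/-1.
            level_gap = np.mean((np.abs(cand) - 1.0) ** 2)
            # Penalty 2: how far the columns are from being orthonormal.
            M = cand.T @ cand / cand.shape[0]
            ortho_gap = np.sum((M - np.eye(p)) ** 2)
            score = level_gap + ortho_gap
            if score < best_score:
                best_i, best_score = i, score
        chosen.append(best_i)
        remaining.remove(best_i)
    return np.array(chosen)
```

One would then fit ordinary least squares on the selected rows only, e.g. `np.linalg.lstsq` applied to `X[idx]` and `y[idx]` for `idx = greedy_orthogonal_subsample(X, k)`.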

Similar resources

Linear grouping using orthogonal regression

This paper proposes a new method, called linear grouping algorithm (LGA), to detect different linear structures in a data set. LGA is useful for investigating potential linear patterns in datasets, that is, subsets that follow different linear relationships. LGA combines ideas from principal components, clustering methods and resampling algorithms. It can detect several different linear relatio...

Data Depth for Classical and Orthogonal Regression

We present a comparison of different depth notions which are appropriate for classical and orthogonal regression with and without intercept. We consider the global depth and tangential depth introduced by Mizera (2002) and the simplicial depth studied for regression in detail at first by Müller (2005). The global depth and the tangential depth are based on quality functions. These quality funct...

Fast Gaussian Process Regression for Big Data

Gaussian Processes are widely used for regression tasks. A known limitation in the application of Gaussian Processes to regression tasks is that the computation of the solution requires performing a matrix inversion. The solution also requires the storage of a large matrix in memory. These factors restrict the application of Gaussian Process regression to small and moderate size data sets. We p...
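To make the scaling limitation concrete, here is a minimal NumPy sketch of exact Gaussian process regression with a squared-exponential kernel (an illustrative assumption, not the method proposed in that paper). Building and factorizing the n × n training kernel matrix is the O(n²)-memory, O(n³)-time step described above.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel matrix between the rows of A and B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / lengthscale ** 2)


def gp_posterior_mean(X_train, y_train, X_test, noise_var=1e-2, lengthscale=1.0):
    """Exact GP regression posterior mean.  The n x n kernel matrix must be
    stored in memory and factorized, which limits exact GP regression to
    small and moderate sample sizes."""
    n = len(X_train)
    K = rbf_kernel(X_train, X_train, lengthscale) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)                                  # O(n^3) factorization
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # two triangular solves
    return rbf_kernel(X_test, X_train, lengthscale) @ alpha    # predictive mean
```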

Parametric Gaussian Process Regression for Big Data

This work introduces the concept of parametric Gaussian processes (PGPs), which is built upon the seemingly self-contradictory idea of making Gaussian processes parametric. Parametric Gaussian processes, by construction, are designed to operate in “big data” regimes where one is interested in quantifying the uncertainty associated with noisy data. The proposed methodology circumvents the welles...

A Divided Regression Analysis for Big Data

Statistics is an important part of big data analysis because many statistical methods are used to analyze big data. The aim of statistics is to estimate a population from a sample extracted from it, so statistics analyzes the sample rather than the population. In a big data environment, however, we can obtain a data set close to the population through advanced computing systems such as cloud...

Journal

Journal title: The Annals of Applied Statistics

Year: 2021

ISSN: 1941-7330, 1932-6157

DOI: https://doi.org/10.1214/21-aoas1462